AEP v2.0.0 — autonomy loop (G2–G7 + full_auto)#11

Merged: memorysaver merged 8 commits into main from feat/aep-v2-autonomy on Jun 15, 2026
@memorysaver

Owner

Summary

Implements the loop-engineering autonomy gaps from docs/research/loop-engineering-autonomy-gap.md (G2–G7) plus the A1 full_auto master switch. Builds on v1.8.0 (claude-team removed → native-bg-subagent default + post-spawn liveness probe). Every new capability defaults to human-in-the-loop; autonomy is opt-in via topology.routing flags. Version bumped to 2.0.0.

Research + design lineage is included in the branch (docs/research/loop-engineering-autonomy-gap.md, docs/research/g4-dogfood-validation-design.md) and decisions were taken interactively (G1 rejected; full-auto kept opt-in; grouped_change kept as the one documented exception to one-subagent-per-story).

What's in it

| Gap | Change |
| --- | --- |
| G2 | Change-strategy recovery ladder (gen-eval/references/recovery-ladder.md) wired into build Phase 5 + autopilot tick ④: same-fix → re-ground → fresh generator → decompose before the human gate. |
| G3 | Visual Design evaluator dimension (gen-eval scoring + evaluator contract); multimodal on both hosts. |
| G4a | Post-merge guard (autopilot/references/post-merge-guard.md + tick ③.5): deploy-health monitoring, conservative auto_revert (default off). |
| G4b | Host-aware dogfood (executor/references/dogfood-validation.md): Claude→agent-browser, Codex→native browser/computer-use or Playwright; config-first URL with CI fallback. |
| G5 | Telemetry-driven reflect (reflect/references/telemetry-ingestion.md): auto-ingestion + quantitative outcome auto-eval. |
| G6 | New /aep-watch skill: self-feeding work discovery (registered in marketplace.json). |
| G7 | Loop hygiene unified on --max-turns. |
| A1 | topology.routing.full_auto (default false) master switch over the strategic human gates. |
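
As an illustration of the G2 row, the ladder can be read as an ordered escalation that only reaches the human gate once every rung has been tried. The sketch below is a minimal model of that ordering, not the logic in recovery-ladder.md; the `Rung` enum, `next_rung`, and `climb` are hypothetical names introduced here, and only the rung order comes from the PR.

```python
from enum import IntEnum
from typing import Optional


class Rung(IntEnum):
    """Recovery rungs, in the order the PR lists them (names are illustrative)."""
    SAME_FIX = 0
    RE_GROUND = 1
    FRESH_GENERATOR = 2
    DECOMPOSE = 3


def next_rung(current: Rung) -> Optional[Rung]:
    """Return the rung after `current`, or None when the ladder is exhausted."""
    if current is Rung.DECOMPOSE:
        return None  # nothing left to try automatically
    return Rung(current + 1)


def climb(start: Rung = Rung.SAME_FIX) -> list[str]:
    """List every rung from `start` upward, ending with the human gate."""
    path: list[str] = []
    rung: Optional[Rung] = start
    while rung is not None:
        path.append(rung.name.lower())
        rung = next_rung(rung)
    path.append("human_gate")  # assumption: the gate is the step after the last rung
    return path
```

Calling `climb()` walks all four rungs before yielding `human_gate`, which mirrors the "before the human gate" wording in the table.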

Safety posture

  • Defaults preserve current human-gated behavior everywhere; full_auto / auto_revert / auto_outcome_eval / watch.auto_create are explicit opt-ins.
  • Orchestrator boundary intact (signals/CI/gh only; no workspace-code reads, no gh pr merge from main).
  • native-bg-subagent + mandatory post-spawn liveness probe on every spawn path; one-launch=one-subagent=one-story invariant explicit (grouped_change is the documented exception).
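
To make the opt-in posture concrete, here is a small sketch of the four flags with their off-by-default values. The flag names come from the PR; the `RoutingFlags` container, the flattened `watch_auto_create` field, and `strategic_gate_pauses` are assumptions for illustration and do not reflect the real product-context schema layout.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RoutingFlags:
    """Opt-in autonomy flags named in the PR; every one defaults to off."""
    full_auto: bool = False
    auto_revert: bool = False
    auto_outcome_eval: bool = False
    watch_auto_create: bool = False  # stands in for watch.auto_create


def strategic_gate_pauses(flags: RoutingFlags) -> bool:
    """Hypothetical check: a strategic gate pauses for a human unless full_auto is on."""
    return not flags.full_auto
```

With no configuration at all, `strategic_gate_pauses(RoutingFlags())` is true, so a project that never sets these keys keeps the human-gated behavior.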

Process

Built via parallel sub-agents (new files + per-file wiring), then a design-review subagent pass; its findings (1 blocker + 5 should-fix + nits) are all addressed in fix(aep-v2): address design review — notably authoring telemetry-ingestion.md in the canonical _shared/references/ (the build-generated copy had been wiped), plus state-schema guard_state / escalation-enum / recovery_rung registration and doc-count fixes.

Verification

  • bash scripts/build-skills.sh --check → in sync
  • product-context schema + marketplace.json parse clean
  • lefthook (oxlint/oxfmt/skills-build) green on every commit

🤖 Generated with Claude Code

memorysaver and others added 8 commits June 15, 2026 23:54
Web research on loop engineering (5 building blocks, ReAct, Ralph loop)
mapped against current AEP workflow. Scorecard + gap classification
(G1 fresh-context, G2 recovery ladder, G4 post-merge guard, G5 telemetry
reflect, G6 self-feeding discovery, G7 hygiene) with priority ordering.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
All 7 gap-fill methods (G1-G7) confirmed cross-host compatible via the
executor abstraction. Resolved two caveats: G3 visual evaluator (Codex
confirmed multimodal), G7 unifies on --max-turns (drop codex-only
token_budget as primary). G1 standardizes on exec/headless one-shot
per task to avoid nesting limits.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Spawn granularity in AEP is the story (one worker per story per round);
deliberately not subdividing into per-task fresh contexts. G1 moved to a
"Rejected" record with rationale; scorecard, gap buckets, priority, and
compatibility tables updated. Gaps now G2-G7 (6 methods).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Post-deploy staging/prod validation with host-aware method selection:
Claude Code auto-detects agent-browser; Codex uses native in-app
browser+computer-use (desktop) or Playwright scripts (headless codex-exec,
since computer-use is desktop-only). URL resolution = config first,
CI fallback. Integration: upgrade Phase 6 + new post-deploy step.
Issues auto-create stories via reflect classifier (links G6).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
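
The host-aware selection described in this commit can be sketched roughly as follows. The function names `dogfood_method` and `target_url` appear later in the PR, but their signatures, the host strings, and the returned labels here are assumptions rather than the contents of dogfood-validation.md.

```python
from typing import Optional


def dogfood_method(host: str, headless: bool = False) -> str:
    """Pick a validation method per host (labels are illustrative)."""
    if host == "claude":
        return "agent-browser"
    if host == "codex":
        # computer-use is desktop-only, so headless runs fall back to Playwright
        return "playwright" if headless else "native-browser"
    raise ValueError(f"unknown host: {host}")


def target_url(config_url: Optional[str], ci_url: Optional[str]) -> Optional[str]:
    """Resolve the URL to validate: config first, CI-provided URL as fallback."""
    return config_url or ci_url
```

For example, `dogfood_method("codex", headless=True)` selects the Playwright path, and `target_url(None, ci_url)` returns the CI value only because no config URL was set.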
…ood, telemetry reflect, self-feeding watch, visual eval, full-auto switch

Implements the retained loop-engineering gaps (G2–G7) plus the A1 full-auto
master switch, all defaulting to human-in-the-loop (opt-in only).

- G2 recovery ladder: gen-eval/references/recovery-ladder.md; build Phase 5 and
  autopilot tick ④ climb same-fix → re-ground → fresh native-bg-subagent →
  decompose before the eval_not_converging human gate.
- G4 host-aware dogfood + post-merge guard: executor/references/dogfood-validation.md
  (dogfood_method()/target_url(), Claude=agent-browser, Codex=native/Playwright),
  autopilot/references/post-merge-guard.md + tick Step ③.5; build Phase 6 host-aware;
  on-issue → reflect story; hard regression → conservative auto_revert (default off).
- G5 telemetry reflect: reflect/references/telemetry-ingestion.md; reflect Step 1
  auto-ingestion + Step 2.75 quantitative outcome auto-eval; tick layer-completion.
- G6 self-feeding discovery: new /aep-watch skill (registered in marketplace.json).
- G3 visual evaluator: Visual Design dimension in gen-eval scoring + evaluator contract.
- G7 loop hygiene: unified --max-turns budget; cap = possibly-unsolvable.
- A1 full_auto master switch (default false) gates strategic pauses; config keys
  added to product-context schema (all 3 templates). Quick-reference updated.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
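
One way to read the post-merge guard outcomes in the G4 bullet is as a small decision function. This is a hedged sketch: the inputs, the returned action labels, and the choice to escalate a hard regression when auto_revert is off are assumptions made here, not behavior confirmed by post-merge-guard.md.

```python
def guard_action(issue_found: bool, hard_regression: bool, auto_revert: bool = False) -> str:
    """Map guard observations to an action label (labels are illustrative)."""
    if hard_regression:
        # auto_revert defaults to off, so the conservative path is to escalate
        return "auto_revert" if auto_revert else "escalate"
    if issue_found:
        return "reflect_story"  # on-issue: route the finding into a reflect story
    return "ok"
```

Under this reading, a revert only ever happens when a project has both observed a hard regression and explicitly opted in.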
- B1: author telemetry-ingestion.md in canonical _shared/references/ (was created
  in a build-generated dir and wiped by build-skills.sh); rebuild materializes it
  into reflect/ + watch/ — G5 + watch ingestion now resolve.
- S1: add guard_state entry to autopilot state-schema (post-merge-guard idempotency).
- S3: register post_merge_regression in the escalation type enum.
- S2: document recovery_rung in eval-protocol status.json fields.
- S4: schema health_signals example ci → ci_status (matches the guard's key).
- S5: skill count 16 → 17 in README + orientation; add /aep-watch to orientation table.
- N1: brief Codex dogfood recipe pointer in codex-native.md.
- oxfmt markdown reformatting.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…coverage guard)

Closes the v2 telemetry gap: consumers shipped without a way to decide/wire sources.

- Coverage rule + coverage_check() helper in telemetry-ingestion.md (canonical
  _shared/references): a source is needed iff a quantitative success_metric or
  health_signal requires it.
- /aep-map gains a Telemetry Binding step (the decision owner): bind each needed
  signal to a detected/declared source via metric_map; flag the unmeasurable.
- /aep-scaffold audit detects the observability stack (Sentry/Datadog/PostHog/
  OTel/health endpoint) → candidate telemetry_sources.
- /aep-watch (Step 0 precondition), /aep-reflect Step 2.75, and post-merge guard
  run coverage_check() and BLOCK the auto path when the map binding is incomplete
  ("run /aep-map observability step") — never silently no-op.
- schema documents telemetry_sources[].metric_map + the coverage rule.

Folded into the unreleased v2.0.0 (PR #11). oxfmt + build-skills in sync.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@memorysaver

Owner Author

Added: telemetry source determination (commit 94c76a2)

Closes the gap where v2's telemetry consumers (G5 reflect auto-eval, G6 /aep-watch, G4a post-merge guard) shipped with no way for a project to decide/wire telemetry_sources.

  • Hybrid rule (telemetry-ingestion.md §1.5): a source is needed iff some quantitative success_metric or health_signal requires it.
  • /aep-scaffold audit detects the observability stack → candidate sources.
  • /aep-map gains a Telemetry Binding step (the decision owner) — binds each needed signal to a source via metric_map; flags the unmeasurable.
  • Shared coverage_check(): /aep-watch (Step 0), /aep-reflect Step 2.75, and the post-merge guard all run it and block the auto path when the binding is incomplete ("run /aep-map observability step"); they never silently no-op.
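
The coverage rule in the list above can be sketched as a set difference between the signals that need measuring and the signals some source is bound to. The name `coverage_check` and the `telemetry_sources[].metric_map` field come from the PR; the argument shapes, the `auto_path_allowed` helper, and the example values are assumptions for illustration.

```python
from typing import Iterable


def coverage_check(needed_signals: Iterable[str], telemetry_sources: list[dict]) -> list[str]:
    """Return the needed signals that no source's metric_map binds, sorted."""
    bound: set[str] = set()
    for source in telemetry_sources:
        # assumption: metric_map is keyed by signal name
        bound.update(source.get("metric_map", {}))
    return sorted(set(needed_signals) - bound)


def auto_path_allowed(needed_signals: Iterable[str], telemetry_sources: list[dict]) -> bool:
    """Hypothetical gate: the auto path runs only when nothing is unbound."""
    return not coverage_check(needed_signals, telemetry_sources)
```

A consumer would call `coverage_check` first and, if the result is non-empty, stop and point the user at the /aep-map observability step rather than continuing with missing data.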

Folded into the unreleased v2.0.0. Passed a focused design-review (no blockers; fixed a dangling /aep-onboard detection claim — onboard is tooling-only).

@memorysaver memorysaver merged commit 071f98c into main Jun 15, 2026
2 checks passed
@memorysaver memorysaver deleted the feat/aep-v2-autonomy branch June 15, 2026 23:43